Apparatus and method for speech recognition

ABSTRACT

Disclosed is an apparatus for speech recognition and automatic translation operated in a PC or a mobile device. The apparatus for speech recognition according to the present invention includes a display unit that displays a screen for selecting a domain as a unit for a speech recognition region previously sorted for speech recognition to a user; a user input unit that receives a selection of a domain from the user; and a communication unit that transmits the user selection information for the domain. According to the present invention, the apparatus for speech recognition using an intuitive and simple user interface is provided to a user to enable the user to easily select/correct a designation domain of a speech recognition system and improve accuracy and performance of speech recognition and automatic translation by the designated system for speech recognition.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of Korean Patent Application No. 10-2012-0046426 filed in the Korean Intellectual Property Office on May 2, 2012 and Korean Patent Application No. 10-2012-0118892 filed in the Korean Intellectual Property Office on Oct. 25, 2012, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present invention relates to an apparatus equipped with a speech recognition function and an automatic interpretation function or a method for speech recognition, and more particularly, to a method of selecting a domain of a database for speech recognition.

BACKGROUND ART

Related-art systems for speech recognition or automatic interpretation cannot be trained efficiently on the vocabularies and expressions of many fields and have therefore been trained for only one region, that is, one domain. In most cases, the speech recognition or automatic interpretation application cannot change the domain designated as default. Even when a user can directly select a domain, the application is inconvenient to use and the selectable contents are very simple. As a result, adaptation to the speech recognition environment is degraded and the accuracy of speech recognition and automatic interpretation is degraded.

SUMMARY OF THE INVENTION

The present invention has been made in an effort to increase the accuracy of speech recognition and automatic interpretation by providing a user interface that helps a user easily select a database, that is, a domain referenced for speech recognition or automatic interpretation, and that facilitates domain selection depending on the circumstances.

An exemplary embodiment of the present invention provides an apparatus for speech recognition including: a display unit that displays a screen for selecting a domain as a unit for a speech recognition region previously sorted for speech recognition to a user; a user input unit that receives a selection of a domain from the user; and a communication unit that transmits the user selection information for the domain.

According to the exemplary embodiments of the present invention, it is possible to provide the user with a method of intuitively and simply selecting a domain and to improve the accuracy and performance of the speech recognition and the automatic interpretation.

The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating a network for providing a speech recognition service according to an exemplary embodiment of the present invention.

FIG. 2 is a diagram illustrating a configuration of a user terminal according to an exemplary embodiment of the present invention.

FIG. 3 is a diagram illustrating a structure and a relationship of designation domains according to an exemplary embodiment of the present invention.

FIGS. 4 to 11 are diagrams illustrating a screen displayed on a display unit of a user terminal according to an exemplary embodiment of the present invention.

FIG. 12 is a flow chart illustrating a method of designating a speech recognition domain according to an exemplary embodiment of the present invention.

It should be understood that the appended drawings are not necessarily to scale, presenting a somewhat simplified representation of various features illustrative of the basic principles of the invention. The specific design features of the present invention as disclosed herein, including, for example, specific dimensions, orientations, locations, and shapes will be determined in part by the particular intended application and use environment.

In the figures, reference numbers refer to the same or equivalent parts of the present invention throughout the several figures of the drawing.

DETAILED DESCRIPTION

Hereinafter, exemplary embodiments of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a network configuration for providing a method for speech recognition according to an exemplary embodiment of the present invention.

A user terminal 10 receives speech and domain selection information from a user and transfers the received speech and domain selection information to a speech recognition server 20. The user terminal 10 is a device equipped with communication functions, such as a PC, a notebook, or a smart phone, and may be any computing device that enables a user to input speech or text.

The speech recognition server 20 uses the received speech and the information on the selected domain to perform speech recognition by referring to the data corresponding to the domain selected by the user among the reference data for speech recognition stored in a DB 30. The speech recognition result is then transmitted to the user terminal 10.

Various data required for the speech recognition server 20 to perform the speech recognition operation are stored in the DB 30, and the data referenced during the speech recognition operation, for example, a corpus, a language dictionary, and the like, are stored in the DB 30 for each domain.
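The per-domain storage described above can be illustrated with a minimal sketch. The dictionary REFERENCE_DB, the function reference_data, and the sample words are hypothetical names and data chosen for illustration and are not part of the disclosure; the sketch also assumes that the general domain is always included in the lookup.

```python
from typing import Dict, Iterable, Set

# Hypothetical stand-in for the DB 30: reference data stored for each domain.
REFERENCE_DB: Dict[str, Set[str]] = {
    "General": {"where", "is", "the", "nearest"},
    "Touring": {"passport", "sightseeing"},
    "Conference": {"keynote", "proceedings"},
}


def reference_data(selected: Iterable[str], db: Dict[str, Set[str]]) -> Set[str]:
    """Collect the data for the general domain plus each selected domain."""
    vocabulary: Set[str] = set()
    for name in ["General", *selected]:
        # Unknown domain names contribute nothing (assumption of this sketch).
        vocabulary |= db.get(name, set())
    return vocabulary
```

With this sketch, selecting 'Conference' makes the conference vocabulary available to the recognizer while the touring vocabulary stays out of the search space.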

Hereinafter, the user terminal 10 will be described in more detail with reference to FIG. 2.

As illustrated in FIG. 2, the user terminal 10 according to the exemplary embodiment of the present invention may include a display unit 100, a user input unit 200, and a communication unit 300.

The display unit 100 displays information necessary for speech recognition and may display to a user menus for designating a domain referenced for speech recognition. In the present exemplary embodiment, the speech recognition server 20 is a system that receives a speech signal, recognizes its meaning, and performs the speech recognition based on either a designation domain designated by the user or a general domain.

The general domain is a database that is referenced to support speech recognition for generally used language rather than for a specific domain, and the designation domain is a database selected automatically or by a user for specific situations so as to support more accurate speech recognition than the general domain. For example, when the input speech relates to touring, the speech recognition may be performed by using a ‘touring’ domain as the designation domain, and better speech recognition results may be generated than in the case in which the general domain is selected.

The concept of the speech recognition domain will be described in more detail with reference to FIG. 3. According to the exemplary embodiment of the present invention, the speech recognition domain may be understood as a unit that classifies the region of the speech recognition, that is, as the database referenced during the speech recognition process.

Referring to FIG. 3, as described above, the speech recognition server 20 may be operated for a general domain 31 as default or by the user selection. Here, the speech recognition server 20 may have a first sub domain 32 as each designation domain and may have second sub domains 33 using the first sub domain 32 as a parent domain. Although not illustrated, the speech recognition server 20 may include third and fourth sub domains by using the second sub domain as the parent domain.

The second sub domains may replace some characteristics (words or expressions) of the parent domain and may also have characteristics that the parent domain does not have. Domains may also overlap each other. For example, a touring domain may partially overlap a business domain, which is another domain, and a restaurant domain, which is a sub domain of the touring domain, may also partially overlap the business domain.
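The parent and sub domain relationship can be sketched as a small tree in which a sub domain both replaces and extends what it inherits. The class Domain, its lexicon field, and the sample entries are assumptions made for illustration; the disclosure does not specify how domain data is represented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Domain:
    """One node of the domain hierarchy (illustrative structure only)."""
    name: str
    parent: Optional["Domain"] = None
    # Words or expressions mapped to the reading used within this domain.
    lexicon: Dict[str, str] = field(default_factory=dict)

    def effective_lexicon(self) -> Dict[str, str]:
        """Merge lexicons from the root down so that sub domain entries win."""
        merged = self.parent.effective_lexicon() if self.parent else {}
        merged.update(self.lexicon)
        return merged


# Hypothetical sample data: 'table' is replaced, 'galbi' is newly added.
general = Domain("General", lexicon={"table": "furniture"})
touring = Domain("Touring", general, {"check-in": "hotel arrival"})
restaurant = Domain(
    "Restaurant", touring,
    {"table": "reserved seating", "galbi": "Korean grilled ribs"},
)
```

Here the restaurant domain inherits 'check-in' from the touring domain, replaces the general reading of 'table', and adds 'galbi', which neither parent has.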

Hereinafter, a configuration of a screen for selecting a domain displayed on the display unit 100 according to the exemplary embodiment of the present invention will be described with reference to the accompanying drawings.

In the present exemplary embodiment, the domain display unit 100 displays a domain that can be selected or deselected by the user. Referring to FIG. 4, the display unit 100 may display the domain in a hierarchical structure or a tree structure. As illustrated in FIG. 4, each domain may be represented by a label that is a name of the designation domain among the domains.

The general domain is expressed by a label named ‘General’ and may include four sub domains: a touring domain 47, a business domain, a conference domain 45, and a medicine domain 46.

The touring-related domain is expressed by a domain 47 having a label named ‘Touring’ and may include three sub domains: a restaurant domain, an airport domain, and a car rent domain. The restaurant domain is expressed by a domain having a label named ‘Restaurant’ and may include additional sub domains according to kinds of restaurants. For example, in the case of FIG. 4, a Korean restaurant domain is expressed by a label called ‘Korean Food’ and a Chinese restaurant domain is expressed by a label called ‘Chinese Food’.

The Korean restaurant domain may include language data for Korean foods and restaurant names so as to support the speech recognition.

The conference domain may be expressed by a label 45 named ‘Conference’ and may include a computer engineering domain and a mechanical engineering domain as sub domains. The computer engineering domain may be expressed by a label called ‘Computer Science’ and the mechanical engineering domain may be expressed by a label called ‘Mechanical Engineering’. In the case of the conference domain, professional vocabulary is used more frequently than in other fields, so the speech recognition region is subdivided according to the conference-related fields, thereby increasing the accuracy of recognition and interpretation when the designated speech recognition service is provided.

Hereinafter, the user input unit 200, which in the present exemplary embodiment receives the selection of the domain from the user through the domain display unit 100, will be described.

Referring to FIG. 4, in the present exemplary embodiment, the display unit 100 of the user terminal 10 arranges the domain regions into a tree structure and displays them to the user, and the user may select a domain by dragging and dropping 43 the domain 45 to be selected into a designation domain display region 42 with a mouse or a touch gesture. A domain 44 previously selected in the designation domain display region 42 is dragged and dropped outside the region to release the designation.

In this case, a domain tree may display or hide the sub domains by a ‘+’ button 46 or a ‘−’ button 47.

The domains previously selected among the nodes of the tree are expressed differently so that the user can avoid unnecessary re-selection. The general domain in the designation domain display region 42 is selected in advance and is therefore displayed differently from other domains, thereby enabling the user to recognize this fact.

Referring to FIG. 4, the previously selected designation domains ‘General’, ‘Touring’, ‘Restaurant’, and ‘Korean Food’ are displayed 48 in the designation possible domain display region of the display unit differently from the domains that are not selected, so that the user is informed of the selected domains. The selected domains are displayed in the designation domain display region 42. Among them, the ‘General’ domain 48 is selected by default and its selection cannot be released. The user is informed of this fact by expressing 49 the ‘General’ domain 48 differently from the other selected domains.
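The selection state described above, in which the ‘General’ domain is always selected and cannot be released, can be sketched as follows. The class DomainSelection, its method names, and the style strings are hypothetical and serve only to illustrate the behavior.

```python
class DomainSelection:
    """Tracks which domains are selected (illustrative sketch only)."""

    ALWAYS_SELECTED = "General"

    def __init__(self) -> None:
        self._selected = {self.ALWAYS_SELECTED}

    def select(self, name: str) -> None:
        self._selected.add(name)

    def release(self, name: str) -> bool:
        """Release a domain; the general domain is refused."""
        if name == self.ALWAYS_SELECTED:
            return False
        self._selected.discard(name)
        return True

    def display_style(self, name: str) -> str:
        """Return how a label would be drawn for the given domain."""
        if name == self.ALWAYS_SELECTED:
            return "locked"      # selected and not releasable
        return "selected" if name in self._selected else "unselected"
```

A user interface could map the three returned styles to three distinct visual treatments, so that already selected domains and the non-releasable general domain are both recognizable at a glance.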

FIG. 4 illustrates the ‘Conference’ domain 45 being dragged 43 to the designation domain display region 42 so that the current user designates the conference-related domain. In the present exemplary embodiment, in addition to dragging and dropping the label of the domain, it is also possible to select the designation domain using a menu called by a mouse double click or a right click.

Referring to FIG. 5, instead of the drag and drop type, the user may select the corresponding domain by checking 52 an empty check box 51 through a click or touch. In the case 53 of the general domain, which should be selected at all times, the check is displayed in another color so that the user is informed that the domain is selected at all times.

FIG. 6 illustrates an example in which the user interface for selecting the domain of the present invention is dynamically expressed. In FIG. 6, the tree structure illustrated in FIGS. 4 and 5 is expressed more dynamically, such that the user may more easily operate the user interface. When the user clicks or touches the domain to be selected, the child nodes are shown as the sub domains of the corresponding domain and can be selected 61. The designation domains that are not selected are expressed 62 differently when presented to the user.

FIG. 7 illustrates an example of the user interface for designating designation domains that have an equivalent structure rather than a hierarchical structure. Each designation domain is represented by an icon. The label name of the domain is displayed at the lower end of the icon to inform the user of which domain the icon corresponds to. FIG. 7 illustrates all the icons in the same form, but it is also possible to configure, for example, a ‘Medical’ icon in an intuitive form such as ‘+’ so that the user can intuitively identify the corresponding domain.

A designation possible domain display region 74 is present in the screen to inform the user of which domains are selectable, and a selected designation domain display region 71 is present at the lower end of the screen to inform the user of which designation domains are currently selected and can be deselected. The user selects a designation domain by clicking or touching the domain 72 corresponding to the designation domain to be selected in the designation possible domain display region, or by dragging and dropping the domain 72 to the selected designation domain display region 71. The icon of the selected designation domain may be removed from the designation possible domain display region to avoid unnecessary re-selection by the user. Similarly, the selection is released by clicking or touching the previously selected designation domain or by dragging and dropping the domain to the designation possible domain display region. The released designation domain may be shown in the designation possible domain display region 74 so as to be selected again.

In order for a user to easily access many designation domains, accessibility is increased by disposing a scroll bar 75. In the present exemplary embodiment, the icons displayed in the designation possible domain display region are changed by the scroll operation, but the icons in the designation domain display region may remain unchanged regardless of the scroll.

FIG. 8 illustrates an example in which the user interface of FIG. 7 is applied to designation domains having a hierarchical structure. When the user clicks or touches the icon of a domain 81 whose lower designation domains the user wants to see, the icons for the lower domains of the corresponding domain are displayed in a lower domain display region 82. The lower domain display region 82 has a boundary so that the region 82 is shown as differentiated from the upper designation domains. Icons 84 within this boundary may be dragged and dropped to a designation domain display region 83 or may be selected as the designation domain using the click or touch operation.

When an upper designation domain is selected or its selection is released, the lower designation domains included in the upper designation domain may also be automatically selected or released to support comprehensive selection and release.
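Comprehensive selection and release can be sketched as an operation that applies to a domain together with all of its lower domains. The mapping CHILDREN mirrors the labels of FIG. 4, while the function names descendants and set_selected are hypothetical.

```python
from typing import Dict, List, Set

# Hierarchy taken from the labels described for FIG. 4.
CHILDREN: Dict[str, List[str]] = {
    "Touring": ["Restaurant", "Airport", "Car Rent"],
    "Restaurant": ["Korean Food", "Chinese Food"],
}


def descendants(name: str, children: Dict[str, List[str]]) -> List[str]:
    """Return every lower domain reachable from the given domain."""
    found: List[str] = []
    stack = list(children.get(name, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found


def set_selected(selected: Set[str], name: str,
                 children: Dict[str, List[str]], on: bool) -> Set[str]:
    """Select or release a domain together with all of its lower domains."""
    affected = {name, *descendants(name, children)}
    return selected | affected if on else selected - affected
```

For example, selecting 'Touring' adds its three sub domains and both restaurant sub domains, and a later release of 'Restaurant' removes only the restaurant branch.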

In FIG. 8, when a user selects the icon 81 corresponding to the ‘Seoul’ domain, the sub domains of the ‘Seoul’ domain, for example, ‘Seoul Hotel’, ‘Seoul Restaurant’, and the like, are displayed. The user may touch the ‘Seoul Hotel’ 84 among the displayed icons or may drag and drop the ‘Seoul Hotel’ 84 to the designation domain display region 83 to select the Seoul Hotel related domain as the designation domain.

The display unit 100 according to the present embodiment may use the user information collected through the user terminal to check the user situations and display the domain recommended according to the checked situation information to the user. FIG. 9 illustrates an example of suggesting the suitable designation domain by checking the user situations.

The user information collected by the user terminal includes the user's location information obtained through a global positioning system (GPS) embedded in the user terminal, surrounding information obtained through a camera, surrounding sound information recognized by a microphone, and the like, and the user situations are checked using this information. The apparatus for speech recognition according to the present exemplary embodiment then recommends designation possible domains to the user based on the information on the user situations. For example, Seoul of Korea may be recommended as the designation domain based on the GPS location of the user, and when it is recognized by the camera that there are restaurants around the user, touring and restaurants can be recommended as the designation domain. When the sounds of planes landing and taking off are recognized by the microphone as the surrounding sound, an airport may be recommended.

Therefore, in the present embodiment, the display unit 100 may emphasize the recommended designation domains and display them on the screen. Enabling the user to easily select only the necessary designation domains prevents the addition of recognition support data for unnecessary vocabularies or sentence expressions, so that the speech recognition and automatic interpretation results can be obtained more rapidly and accurately. Conversely, the designation domains that are judged from the situation information to be unnecessary or of low availability are obscurely displayed or are not displayed, which simplifies recognition by the user and prevents unnecessary selection.

Referring to FIG. 10, when the user location is recognized as Yeosu of Korea through the current GPS information 101, an icon 102 (Yeosu Hotel, Yeosu Restaurant, Yeosu Expo) corresponding to the Yeosu related domain is emphasized and displayed and an icon 103 (Medical) corresponding to the domain having a low degree of association is obscurely displayed.
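The emphasis rule of FIG. 10 can be sketched with a simple tag match between the checked situation and each domain. The disclosure does not state how the degree of association is computed, so the tag sets in DOMAIN_TAGS, the function display_styles, and the style strings are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical association tags for each domain icon.
DOMAIN_TAGS: Dict[str, Set[str]] = {
    "Yeosu Hotel": {"yeosu", "hotel"},
    "Yeosu Restaurant": {"yeosu", "restaurant"},
    "Yeosu Expo": {"yeosu"},
    "Airport": {"aircraft"},
    "Medical": {"hospital"},
}


def display_styles(situation: Set[str],
                   domain_tags: Dict[str, Set[str]]) -> Dict[str, str]:
    """Emphasize domains sharing a tag with the situation; obscure the rest."""
    return {
        name: "emphasized" if tags & situation else "obscured"
        for name, tags in domain_tags.items()
    }


# Situation tags as they might be derived from GPS, camera, or microphone input.
styles = display_styles({"yeosu"}, DOMAIN_TAGS)
```

In this sketch a GPS fix in Yeosu emphasizes the three Yeosu icons and obscures 'Medical', while a situation containing aircraft sounds would emphasize 'Airport' instead.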

The domain display unit 100 according to the present embodiment can also display to the user at least one example of recognition data that exemplifies the speech recognition level designated according to the selection of the domain.

Referring to FIG. 11, when the user wants to add the ‘Conference’ related domain, a recognition example portion 114 indirectly exemplifies the level of recognizable speech for the case in which the ‘Conference’ domain is selected as the designation domain, showing that a sentence such as “Where is the nearest Gal-bi buffet from the 8th Advanced Computing (115) Conference hall?” can be recognized, thereby assisting the selection of the designation domain by the user.

In the exemplary embodiment, the user input unit 200 receives the domain selected by the user through the domain selection screen displayed on the display unit 100 and the communication unit 300 transmits the information on the selected domain to the speech recognition server 20.

The speech recognition server 20 uses the received information on the selected domain to perform the speech recognition by referring to the data corresponding to the domain selected by the user among the reference data for speech recognition stored in the DB 30. The speech recognition result is then transmitted to the user terminal 10.

The speech recognition server 20 according to the exemplary embodiment is described as a separate system with which the user terminal 10 communicates to receive the speech recognition results. Depending on the system performance of the user terminal 10, however, the speech recognition server 20 may be implemented as a speech recognition module in the terminal and the DB 30 may be implemented as the internal memory. In this case, the communication unit 300 of the user terminal 10 may transmit the information on the selected domain to the internal speech recognition module rather than to the external speech recognition server 20.

That is, in this case, the components of the user terminal 10, that is, the display unit 100, the user input unit 200, and the communication unit 300, operate as an interface module for the selection of the speech recognition domain, and the speech recognition server is implemented as a speech recognition module that is interlocked with the interface module and performs the speech recognition.
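The two deployment options, an external server or an internal module, can be sketched by letting the communication unit transmit to any target that accepts a domain selection. The names Recognizer, LocalModule, RemoteServerStub, and CommunicationUnit are hypothetical, and the remote class only records what would be sent instead of performing real network communication.

```python
from typing import List, Protocol


class Recognizer(Protocol):
    """Anything that can receive the selected domains (assumed interface)."""
    def set_domains(self, domains: List[str]) -> None: ...


class LocalModule:
    """Stands in for a speech recognition module inside the terminal."""
    def __init__(self) -> None:
        self.domains: List[str] = []

    def set_domains(self, domains: List[str]) -> None:
        self.domains = list(domains)


class RemoteServerStub:
    """Stands in for a network client; records what would be sent."""
    def __init__(self) -> None:
        self.sent: List[dict] = []

    def set_domains(self, domains: List[str]) -> None:
        self.sent.append({"domains": list(domains)})


class CommunicationUnit:
    """Forwards the selection to whichever target it was configured with."""
    def __init__(self, target: Recognizer) -> None:
        self.target = target

    def transmit(self, domains: List[str]) -> None:
        self.target.set_domains(domains)
```

Because the interface side depends only on the assumed set_domains call, the same display and input code could be paired with either target.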

A user interface that can be easily understood and that allows the designation domain to be corrected simply is provided to users through the apparatus 10 for speech recognition according to the exemplary embodiment of the present invention, which increases adaptation to a changed environment and thereby improves the accuracy of the speech recognition and the automatic translation. Hereinafter, the domain designation method using the apparatus 10 for speech recognition according to the exemplary embodiment will be described.

Referring to FIG. 12, a method for designating a speech recognition region includes displaying a domain selection screen (S100), inputting domain selection (S200), and transmitting selection information (S300).

In the displaying of the domain selection screen (S100), the foregoing display unit 100 displays to a user the screen for selecting the domain as a unit for the speech recognition region of the predetermined classification for the speech recognition designated by the speech recognition server 20.

In the inputting of the domain selection (S200), the foregoing user input unit 200 receives the selection of the domain from the user.

In the transmitting of the selection information (S300), the foregoing communication unit 300 transmits the selection information of the user for the domain to the speech recognition server 20.

The detailed operations of each process of the foregoing region designation method are the same as those described for the display unit 100, the user input unit 200, and the communication unit 300 above, and therefore the description thereof will be omitted.
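The three steps S100 to S300 can be sketched as three functions chained in order. The function names, the callback parameters render, choose, and send, and the JSON message layout are hypothetical; rendering, user input, and transport are passed in as callbacks so that the sketch does not assume any particular user interface or network library.

```python
import json
from typing import Callable, List


def display_selection_screen(domains: List[str],
                             render: Callable[[str], None]) -> None:
    """S100: show each selectable domain to the user."""
    for name in domains:
        render(name)


def input_domain_selection(domains: List[str],
                           choose: Callable[[List[str]], List[str]]) -> List[str]:
    """S200: receive the user's choice, keeping only known domains."""
    return [name for name in choose(list(domains)) if name in domains]


def transmit_selection(selection: List[str],
                       send: Callable[[str], None]) -> None:
    """S300: send the selection information (message format is assumed)."""
    send(json.dumps({"selected_domains": selection}))


def designate_domain(domains: List[str],
                     render: Callable[[str], None],
                     choose: Callable[[List[str]], List[str]],
                     send: Callable[[str], None]) -> List[str]:
    """Run S100, S200, and S300 in order and return the selection."""
    display_selection_screen(domains, render)
    selection = input_domain_selection(domains, choose)
    transmit_selection(selection, send)
    return selection
```

In a test or prototype, render and send can simply be the append methods of two lists, and choose can be a function returning a fixed selection.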

The foregoing example mainly describes the operation for speech recognition, but because the operation for speech recognition is essential for the automatic translation, the example may also be applied to the automatic translation. For example, the speech recognition server 20 of FIG. 1 may be an automatic translation server.

Meanwhile, the embodiments according to the present invention may be implemented in the form of program instructions that can be executed by computers, and may be recorded in computer readable media. The computer readable media may include program instructions, a data file, a data structure, or a combination thereof. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.

As described above, the exemplary embodiments have been described and illustrated in the drawings and the specification. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and their practical application, to thereby enable others skilled in the art to make and utilize various exemplary embodiments of the present invention, as well as various alternatives and modifications thereof. As is evident from the foregoing description, certain aspects of the present invention are not limited by the particular details of the examples illustrated herein, and it is therefore contemplated that other modifications and applications, or equivalents thereof, will occur to those skilled in the art. Many changes, modifications, variations and other uses and applications of the present construction will, however, become apparent to those skilled in the art after considering the specification and the accompanying drawings. All such changes, modifications, variations and other uses and applications which do not depart from the spirit and scope of the invention are deemed to be covered by the invention which is limited only by the claims which follow.

What is claimed is:
 1. An apparatus for speech recognition, comprising: a display unit that displays a screen for selecting a domain for speech recognition to a user; a user input unit that receives a selection of a domain from the user; and a communication unit that transmits the user selection information for the domain.
 2. The apparatus of claim 1, wherein the display unit displays a domain selected by the user or a domain previously selected and deselected by the user.
 3. The apparatus of claim 1, wherein the display unit classifies and displays a domain representing the domain into a layer according to a speech recognition level.
 4. The apparatus of claim 3, wherein the display unit displays a domain for the domain selected by the user among the domains classified and displayed into a layer.
 5. The apparatus of claim 3, wherein the layer according to the speech recognition level classifies a general region providing a basic speech recognition region according to a generation situation of speech and the generation situation is re-classified according to generation places.
 6. The apparatus of claim 3, wherein the display unit displays a domain indicating the domain corresponding to a lower layer of the selected domain according to the domain selection of the user.
 7. The apparatus of claim 1, wherein the display unit uses user information collected by the user terminal to check the user situation and displays a domain recommended according to the checked situation information to the user.
 8. The apparatus of claim 1, wherein the display unit displays at least one exemplified recognition data to a user for exemplifying a speech recognition level designated according to the domain selection to the user.
 9. A method for speech recognition, comprising: displaying a screen for selecting a domain as a unit for a speech recognition region of a predetermined classification for speech recognition to a user; receiving a selection of a domain from the user; and transmitting the user selection information for the domain.
 10. The method of claim 9, wherein in the displaying of the screen for selecting the domain, a domain selected by the user or a domain previously selected and deselected by the user is displayed.
 11. The method of claim 9, wherein in the displaying of the screen for selecting the domain, a domain representing the domain is classified into a layer according to a speech recognition level and is displayed.
 12. The method of claim 11, wherein in the displaying of the screen for selecting the domain, a domain for the domain selected by the user among the domains sorted and displayed into a layer is displayed.
 13. The method of claim 11, wherein the layer according to the speech recognition level has a hierarchical structure in which a general region providing a basic speech recognition region is sorted according to a generation situation of speech and the generation situation is resorted according to generation places.
 14. The method of claim 11, wherein in the displaying of the screen for selecting the domain, a domain indicating the domain corresponding to a lower layer of the selected domain is displayed according to the domain selection of the user.
 15. The method of claim 11, wherein in the displaying of the screen for selecting the domain, the user situation is checked using user information collected by the user terminal and a domain recommended according to the checked situation information is displayed to the user.
 16. The method of claim 11, wherein in the displaying of the screen for selecting the domain, at least one exemplified recognition data for exemplifying a speech recognition level designated according to the domain selection to the user is displayed to the user.
 17. A computer readable recording medium in which programs executed on a computer are stored, comprising: displaying a screen for selecting a domain for speech recognition to a user; receiving a selection of a domain from the user; and transmitting selection information of the user for the domain.
 18. An apparatus for speech recognition, comprising: an interface module that displays a screen for selecting a domain for speech recognition to a user to receive a selection of a domain from the user and transmit the selection information of the user for the domain; and a speech recognition module that refers to data corresponding to the domain selected by the user among reference data for speech recognition through the received selection information of the user to perform the speech recognition.